Log Metrics : Hyne v KPP

I supplied Stan with parts of the Hyne dataset. He took a look at it and wrote a brief report. These are my comments on his report.

  1. I have no problem with splitting the dataset along some dimension (e.g. LED or wobble) and fitting different models to each pool. I also have no problem with treating the Hyne and KPP datasets as completely distinct. However, as soon as you start comparing models across datasets, it seems to me that they should be treated as a single set. Even if treating them separately is intellectually defensible, there is also the practical issue of having to fit a different model for each mill-resource combination.

  2. Stan would probably counter the above by saying that the shape data is distinctly different, hence the models should be expected to differ. If only the coefficients differed then this would be easy to accept; however, the models include entirely different predictors. So let's dig a little further into whether (and by how much) the shape data from KPP and Hyne differ.
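A minimal sketch (on simulated stand-in data, not the real Hyne/KPP tables) of what treating the datasets as a single set could look like in practice: one pooled fit with a dataset term, compared formally against dataset-specific slopes, rather than a separate model per mill-resource combination.

```r
## simulated stand-in data -- not the real Hyne/KPP tables
set.seed(6)
d <- data.frame(x = rnorm(100), dset = rep(c("KPP", "Hyne"), 50))
d$y <- d$x + (d$dset == "Hyne") * 0.5 + rnorm(100, sd = 0.3)
pooled   <- lm(y ~ x + dset, d)   # shared slope, per-dataset offset
separate <- lm(y ~ x * dset, d)   # dataset-specific slope as well
anova(pooled, separate)           # does the extra flexibility earn its keep?
```

The point of the nested-model comparison is that "fit per dataset" becomes a testable hypothesis inside one fit, rather than an assumption baked in up front.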

Further, if relationships really exist that are obscured by treating all observations as a single pool, then I'd expect different re-poolings to consistently yield similar models. If that doesn't happen, the apparent relationships look more like luck than structure.
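The consistency check I have in mind can be sketched like this (simulated data; x2 is deliberately pure noise): refit the same candidate model on many random re-poolings and see whether the estimated effects stay stable.

```r
## simulated data: only x1 carries signal, x2 is noise
set.seed(1)
n <- 200
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- 2 * d$x1 + rnorm(n)
coefs <- t(sapply(1:20, function(i) {
  ii <- sample(n, n / 2)                # one random re-pooling
  coef(lm(y ~ x1 + x2, d[ii, ]))
}))
# x1 consistently near 2, x2 consistently near 0 across re-poolings
rbind(mean = colMeans(coefs), sd = apply(coefs, 2, sd))
```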

  1. Why didn’t I get results as good as Stan’s when I applied his KPP methodology to the Hyne crook data (sans stem-related measures)?

  2. Stan claims that the models have sufficient predictive power to be of significant utility. It would be good to quantify/clarify this.
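To make "sufficient predictive power" concrete, cross-validated RMSE and R² would do. A sketch on simulated stand-in data (not the real models or tables):

```r
## 5-fold cross-validated RMSE and R^2 on simulated stand-in data
set.seed(2)
n <- 150
d <- data.frame(x = rnorm(n))
d$y <- d$x + rnorm(n, sd = 0.5)
folds <- sample(rep(1:5, length.out = n))
press <- sum(sapply(1:5, function(k) {
  fit <- lm(y ~ x, d[folds != k, ])
  sum((d$y[folds == k] - predict(fit, d[folds == k, ]))^2)
}))
c(cv.rmse = sqrt(press / n),
  cv.r2   = 1 - press / sum((d$y - mean(d$y))^2))
```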

Other minor things:

Compare Shape Profiles

Stan claims that the Hyne logs are smaller, less butt-log-like and rougher than the KPP logs. Figures 3a and 3b are presented in support. These figures, though:

  • show only the first 6 logs from each dataset,
  • show area not diameter (exaggerating roughness),
  • and do not have a common y-scale.

library(lattice)   # xyplot, densityplot, splom used throughout

# read cross-section and log-level CSVs, merge on the log ID column,
# and standardise that column's name to LogNumber
my.read.csv <- function(xfn, lfn, dset, logID) {
  X <- read.csv(xfn)
  L <- read.csv(lfn)
  X <- merge(X, L, by=logID)
  X <- X[order(X[,logID], X$DistanceFromLE),]
  X$dset <- dset
  colnames(X)[colnames(X)==logID] <- "LogNumber"
  return(X)
}
K=my.read.csv('/home/harrinjj/Desktop/Dropbox/4stan/stemshape.csv',
              '/home/harrinjj/Desktop/Dropbox/4stan/logs.csv',
              'LS15:KPP','ScionLogNumber')
# reproduce Stan Fig3a
xyplot(Area ~ DistanceFromLE, K, group=LogNumber, subset=LogNumber%in%c(501,505,510,511,512,517)&DistanceFromLE<4800,type='b')

H=my.read.csv('/home/harrinjj/Desktop/Dropbox/4stan/hyne-xsectns.csv',
              '/home/harrinjj/Desktop/Dropbox/4stan/hyne-logs.csv',
              'LS16:Hyne','SWILogNumber')
# reproduce Stan Fig3b
xyplot(Area ~ DistanceFromLE, H, group=LogNumber, subset=LogNumber%in%c(100,101,102,103,104,105)&DistanceFromLE>0,type='b')

# replot stan's figures on common y-scale
common.cols <- intersect(colnames(H),colnames(K))
X <- rbind(K[,common.cols],H[,common.cols])
xyplot(Area ~ DistanceFromLE | dset, X, group=LogNumber, subset=LogNumber%in%c(100,101,102,103,104,105,501,505,510,511,512,517)&DistanceFromLE>0&DistanceFromLE<4800,type='b')

# plot all log profiles
xyplot(Area ~ DistanceFromLE | dset, X, group=LogNumber, subset=DistanceFromLE>0&DistanceFromLE<4800,type='l')

# plot diameter rather than area
xyplot(Diameter ~ DistanceFromLE | dset, X, group=LogNumber, subset=DistanceFromLE>0&DistanceFromLE<4800,type='l')

# plot diameter rather than area
xyplot(Diameter ~ DistanceFromLE | equal.count(Whorliness,4)*dset, X, group=LogNumber, subset=DistanceFromLE>0&DistanceFromLE<4800,type='l')

# variant for datasets with log-level metrics only (no cross-sections)
my.read.csv <- function(fn, dset, logID) {
  X <- read.csv(fn)
  X$dset <- dset
  colnames(X)[colnames(X)==logID] <- "LogNumber"
  return(X)
}
H=my.read.csv('/home/harrinjj/Desktop/Dropbox/4stan/hyne-logs.csv','LS16:Hyne','SWILogNumber')
K=my.read.csv('/home/harrinjj/Desktop/Dropbox/4stan/logs.csv','LS15:KPP','ScionLogNumber')
K=K[!is.na(K$LogNumber),]
common.cols <- intersect(colnames(H),colnames(K))
L <- rbind(K[,common.cols],H[,common.cols])
densityplot(~Whorliness, group=dset, L, auto.key=TRUE)

# plot diameter vs height for every 7th log, ordered by whorliness,
# stacked vertically (KPP in red, Hyne in blue)
plot(NA, xlim=c(0,5000), ylim=c(0,2000),
     xlab='DistanceFromLE', ylab='Diameter (stacked)')
k <- 0
off <- 0
for (logID in L$LogNumber[order(L$Whorliness)]) {
  if (k %% 7 == 0) {
    ii <- X$LogNumber==logID & X$DistanceFromLE<4800
    col <- if (X$dset[ii][1]=='LS15:KPP') 'red' else 'blue'
    lines(X$DistanceFromLE[ii], X$Diameter[ii]-min(X$Diameter[ii])+off, col=col)
    off <- off + max(X$Diameter[ii]) - min(X$Diameter[ii])
  }
  k <- k + 1
}

Whorliness is possibly not the best indicator of roughness, because it is influenced both by the ‘smooth’ deviation from a quadratic and by local ‘roughness’.

In earlier datasets a quadratic fit seemed good enough to capture differences between butt and upper logs, but for logs with substantial butt flare it may not be. In such cases whorliness conflates butt flare, nodal swelling and local roughness (including measurement error, debark damage, etc.).

Perhaps we need a new roughness metric? It could be based on:

  • deviation from running mean/median/polyfit/…
  • fft coefficients
  • wavelets

Ideally this new measure should be insensitive to scanner resolution.
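As a starting point, here is a sketch of the first option (RMS deviation from a running median), with the window specified in millimetres so that, to first order, scanner resolution drops out. The profile below is simulated (taper plus butt flare plus noise), not a real log:

```r
## roughness = RMS deviation of diameter from a running median;
## window is in mm, so the metric is less tied to slice spacing
roughness <- function(z, d, window = 500) {
  smooth <- sapply(z, function(zi) median(d[abs(z - zi) <= window / 2]))
  sqrt(mean((d - smooth)^2))
}
## toy profile: linear taper + butt flare, then with local noise added
z <- seq(0, 4800, by = 50)
taper <- 400 - 0.02 * z + 50 * exp(-z / 300)
set.seed(7)
noisy <- taper + rnorm(length(z), sd = 3)
c(smooth = roughness(z, taper), noisy = roughness(z, noisy))
```

Because the running median largely absorbs taper and butt flare, the metric responds mainly to the local component, which is the part whorliness conflates with everything else.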

Compare Log Summary Metrics

# get KPP metrics
library(RODBC)
ch=odbcConnect('LS15')
K=sqlQuery(ch,'select  * from logs where ScionLogNumber is not null and velocity>0')
K$dset='LS15:KPP'

ch=odbcConnect('Hyne',uid='sa',pwd="password12")
H =sqlQuery(ch, "select * from logs where SWILogNumber is not null and DateAndTime>'2014-08-30' and velocity>0") 
H$dset='LS16:Hyne'

cols.to.compare = setdiff(intersect(names(H),names(K)),c('m_fail','m_nslices','m_dzmax','m_dzmin','m_led','m_sed','m_a0','m_a1','m_a2'))
L=rbind(K[,cols.to.compare],H[,cols.to.compare])
splom(L[,3:12], group=L$dset, auto.key=TRUE)

There are certainly aspects in which the KPP and Hyne log clouds differ markedly (e.g. LED, waist) and others in which they are quite similar (e.g. ovality, sweep).

In the comparison above, the log metrics are those for the log as selected rather than as processed, i.e. the metrics computed when the original log (or, at KPP, stem) first passed through the scanner.

TODO: switch to using log metrics recomputed from bin files for KPP and Hyne (ELI rather than ROY?)

densityplot(~LED, group=dset, L, auto.key=TRUE)

densityplot(~SED, group=dset, L, auto.key=TRUE)

densityplot(~m_waist, group=dset, L, auto.key=TRUE)

densityplot(~m_whorliness, group=dset, L, auto.key=TRUE)
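Where the density plots look different, a two-sample Kolmogorov-Smirnov test would put a number on it. A sketch on simulated stand-ins for an LED-like metric (not the real columns):

```r
## hypothetical LED values in mm -- simulated, not the real data
set.seed(3)
led.kpp  <- rnorm(300, mean = 350, sd = 40)
led.hyne <- rnorm(300, mean = 300, sd = 40)
ks.test(led.kpp, led.hyne)   # D near 0 = similar shapes, near 1 = very different
```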

Compare Warp Distributions

Stan compares average absolute measures. What about distributions?

Include only full-length boards from KPP in the comparison; then there is no need to consider normalized warp.

# KPP
ch=odbcConnect('LS15')
B.kpp=sqlQuery(ch,'select barcode as ID, Lwarp as length, abs(bowmid) as bow, abs(crookmid) as crook, -twisttot as twist from boards_logVlumber where Lwarp>4.6')
B.kpp$dset='LS15:KPP'
# Hyne
ch=odbcConnect('Hyne',uid='sa',pwd="password12")
B.hyne = sqlQuery(ch, "select flitchId as ID, 4.8 as length, bow, crook, twist  from manualMeas")
B.hyne$dset='LS16:Hyne'
# merge
B=rbind(B.kpp,B.hyne)
# replicate Stan's summary of average absolute warp
B.summ = data.frame()
for (dset in unique(B$dset)) {
  for (p in c('bow','crook','twist')) {
    ii = B$dset==dset
    B.summ = rbind(B.summ, data.frame(dset=dset, meas=p, val=mean(abs(B[ii,p]), na.rm=TRUE)))
  }
}
barchart(val ~ dset, group=meas, B.summ, auto.key=TRUE, ylim=c(0,12))

# yep, looks pretty similar except for twist. Perhaps Stan used signed rather than absolute?

# compare distributions
densityplot(~bow, group=dset, B, auto.key=TRUE)

densityplot(~crook, group=dset, B, auto.key=TRUE)

densityplot(~twist, group=dset, B, auto.key=TRUE)

While warp averages differ, the bow and crook distributions from the KPP and Hyne studies are remarkably similar. Twist is a little different, with considerably more moderate RH twist and less severe LH twist in the KPP dataset.

Why don’t I get the same average twist as Stan?
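One candidate explanation, sketched on made-up twist values: if Stan averaged signed twist while I averaged absolute twist, partial LH/RH cancellation alone could account for the gap.

```r
## made-up twist values (degrees; sign = direction): mostly RH, some strong LH
set.seed(4)
twist <- c(rnorm(80, mean = 3), rnorm(20, mean = -4))
c(signed = mean(twist), absolute = mean(abs(twist)))   # these can differ a lot
```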

Hierarchical Log Clustering

Do KPP/Hyne logs cluster?

TODO
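A minimal sketch of the intended check, on simulated log metrics (three made-up columns with a shift between the two groups, not the real metric tables): scale the common columns, cluster hierarchically, and see whether a two-way cut tracks the dataset label.

```r
## simulated log metrics: 50 "KPP"-like rows, 50 shifted "Hyne"-like rows
set.seed(5)
M <- rbind(matrix(rnorm(150, mean = 0), ncol = 3),
           matrix(rnorm(150, mean = 2), ncol = 3))
dset <- rep(c("KPP", "Hyne"), each = 50)
hc <- hclust(dist(scale(M)), method = "ward.D2")
plot(hc, labels = FALSE)
table(cutree(hc, 2), dset)   # does the 2-cluster cut track dset?
```

If the KPP/Hyne split is real in metric space, the cross-tabulation should be close to diagonal; if the cut mixes the datasets, clustering offers no support for fitting them separately.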